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We discuss the detection of gravitational-wave backgrounds in the context of Bayesian inference 
and suggest a practical definition of what it means for a signal to be considered stochastic—namely, 
that the Bayesian evidence favors a stochastic signal model over a deterministic signal model. A 
signal can further be classified as Gaussian-stochastic if a Gaussian signal model is favored. In 
our analysis we use Bayesian model selection to choose between several signal and noise models for 
simulated data consisting of uncorrelated Gaussian detector noise plus a superposition of sinusoidal 
signals from an astrophysical population of gravitational-wave sources. For simplicity, we consider 
co-located and co-aligned detectors with white detector noise, but the method can be extended 
to more realistic detector configurations and power spectra. The general trend we observe is that 
a deterministic model is favored for small source numbers, a non-Gaussian stochastic model is 
preferred for intermediate source numbers, and a Gaussian stochastic model is preferred for large 
source numbers. However, there is very large variation between individual signal realizations, leading 
to fuzzy boundaries between the three regimes. We find that a hybrid, trans-dimensional model 
comprised of a deterministic signal model for individual bright sources and a Gaussian-stochastic 
signal model for the remaining confusion background outperforms all other models in most instances. 
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I. INTRODUCTION 

A stochastic background of gravitational radiation is 
usually defined as a random gravitational-wave signal 
produced by a large number of weak, independent, and 
unresolved sources. It can be of either astrophysical or 
cosmological origin. The signal is random in the sense 
that it can be characterized only statistically, in terms of 
expectation values of the Fourier components of a plane- 
wave expansion of the metric perturbations. For a suf¬ 
ficiently large number of independent sources, the back¬ 
ground will be Gaussian by the central limit theorem. 
Knowledge of the first two moments of the distribution 
will then sufhce to determine all higher-order moments, 
meaning that the quadratic expectation values (or co- 
variance matrix) of the Fourier components completely 
define a Gaussian background of gravitational radiation. 
For non-Gaussian backgrounds, the only difference is that 
the probability distribution of the Fourier components is 
no longer Gaussian. Thus, third and/or higher-order mo¬ 
ments of the distribution are now required. 

Although there is general agreement with the above 
definition, there has been some confusion and/or dis¬ 
agreement about some of the defining properties of a 
stochastic background, in particular, related to the re¬ 
solvability of a signal and its relationship to duty-cycle 
W- In order to avoid such confusion in this paper, we 
give operational definitions for these properties, framed 
in the context of Bayesian inference. For instance, we de¬ 
fine a signal to be stochastic if it is more parsimonious (in 
a Bayesian model selection sense) to search for that sig¬ 


nal using a stochastic signal model for the waveform than 
using a deterministic signal model. We also define a sig¬ 
nal to be resolvable if it can be decomposed into separate 
(i.e., non-overlapping in either time or frequency) and 
individually detectable signals, again in a Bayesian model 
selection sense. This definition of resolvability is more re¬ 
strictive than that of Rosado [1] , who defines a signal to 
be resolvable if it is separable, independent of detectabil¬ 
ity. With our definition, it is possible to have separable 
signals that are not detectable (e.g., “subthreshold” low- 
duty cycle bursts [5] or non-overlapping low-SNR sinu¬ 
soids), and signals that are detectable but not resolvable 
(e.g., a Gaussian stochastic background integrated over 
a large enough time or large enough frequency band). 

In addition, for non-Gaussian backgrounds associated 
with the superposition of signals from many astrophys¬ 
ical sources, there will sometimes be cases where a few 
bright signals standout above the lower-amplitude “con¬ 
fusion” background. These resolvable deterministic sig¬ 
nals should be ‘subtracted’ from the data, leaving a resid¬ 
ual non-deterministic background whose statistical prop¬ 
erties we would like to determine. In the context of 
Bayesian inference, this ‘subtraction’ is done by allowing 
hybrid signal models, which consist of both parametrized 
deterministic signals and non-deterministic backgrounds. 
By using such models we can investigate the statistical 
properties of the residual background without the influ¬ 
ence of the resolvable signals. This is ultimately the prop¬ 
erty of the stochastic background that we would like to 
determine. 

The closely related question of whether a population 
of astrophysical signals is more likely to be first detected 
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via a stochastic cross-correlation analysis or a template- 
based search for individual signals has recently been con¬ 
sidered by Rosado et al. [5] in the context of pulsar tim¬ 
ing detection of the low-frequency, slowly-evolving signals 
from binary supermassive black holes, and by Mandel [7] 
in the context of ground-based interferometer detections 
of chirping neutron star binaries and continuous waves 
from non-axisymmetric spinning neutron stars. Rosado 
et al. found that pulsar timing arrays are most likely 
to first detect a stochastic background, though in some 
cases a bright and nearby source may be detected first. 
Mandel concluded that individual signals will always be 
detected first for non-overlapping, evolving signals, while 
a stochastic background may be detected first for non¬ 
evolving, overlapping signals, and gave conditions for 
when this might occur. Our results are broadly in agree¬ 
ment with these studies, though it is difficult to directly 
compare our results since we frame the problem in differ¬ 
ent ways and have different definitions for what it means 
for a signal to be considered deterministic or stochastic. 

In this paper, we apply Bayesian inference to non- 
Gaussian gravitational-wave backgrounds, which are pro¬ 
duced whenever the overlap of the gravitational-wave sig¬ 
nals in time-frequency space is sufficiently low that the 
central-limit theorem does not apply. Previous analy¬ 
ses for non-Gaussian backgrounds have typically been 
framed in the context of frequentist statistics, involv¬ 
ing modifications 015] of the standard optimally-filtered 
cross-correlation statistic 0 used to search for Gaussian 
backgrounds. Here we use Bayesian inference to address 
the same problem. We apply Bayesian model selection to 
compare several signal-|-noise models for simulated data 
consisting of uncorrelated Gaussian detector noise plus a 
superposition of sinusoidal signals from an astrophysical 
population of gravitational-wave sources. The analysis is 
done in the frequency domain since the signals we con¬ 
sider are well localized in frequency and spread out in 
time, but our results apply equally well to signals that 
are localized in time and spread out in frequency, such as 
a population of burst signals occurring with some Pois¬ 
son rate. For simplicity, we consider a pair of co-located 
and co-aligned detectors with white detector noise, but 
the method can be extended to more realistic detector 
configurations and power spectra. 

The general trend we observe from our simulations is 
that a deterministic signal model is favored whenever 
the number of sources contributing to the background 
is sufficiently small; a non-Gaussian stochastic model is 
preferred for an intermediate number of sources; and a 
Gaussian stochastic model is preferred for large source 
numbers. However, due to large variations between in¬ 
dividual signal realizations, the boundaries between the 
three regimes are not sharply defined. We find that a 
hybrid, trans-dimensional model comprised of a deter¬ 
ministic signal model for individual bright sources and a 
Gaussian-stochastic signal model for the remaining con¬ 
fusion background outperforms all other models in most 
instances. 


The remainder of the paper is organized as follows: In 
Sec. [H] we give a brief overview of Bayesian inference, and 
apply it to the specific case of non-Gaussian gravational- 
wave backgrounds in Sec. |HI| There we define the rele¬ 
vant noise and signal probability distributions, likelihood 
functions, prior and posterior probability distributions, 
etc. needed for our analysis. In Sec. |IV] we define the 
various signal-|-noise models that we use for the Bayesian 
model selection calculations, the results of which, for sim¬ 
ulated data, are described in detail in Sec. |V] Finally, in 
Sec. |V^ we discuss the relevance of the results in the 
context of current searches for gravitational-wave back¬ 
grounds. Appendix includes a discussion of different 
approaches for calculating Bayes factors. 


II. BAYESIAN INFERENCE - OVERVIEW 


Bayesian inference is a powerful tool for assessing the 
plausibility of hypotheses, given a set of observations and 
prior information [10]. It allows you to update your de¬ 
gree of belief in a particular hypothesis, based on how 
well the hypothesis (or model) fits the observed data. 
It also implements a quantitative version of Occam’s ra¬ 
zor [To], which says that given two models that fit the 
data equally well, the simpler model should be preferred. 
This result falls naturally out of a Bayesian model selec¬ 
tion calculation, where one calculates the posterior odds 
ratio of one model against another. If two models fit the 
data equally well but have different parameter space vol¬ 
umes, then the model with the larger parameter space 
volume is penalized by the ratio of the larger parameter 
space volume to the smaller volume. 

Using Bayesian inference to analyze a particular prob¬ 
lem is very simple in principle—one applies Bayes’ the¬ 
orem, cf. (20), to calculate posterior probability distri¬ 


butions given a likelihood function (which specifies the 
probabilty of the data given the model and the value of 
any parameters associated with it) and a prior proba¬ 
bility distribution for the model and its parameters. In 
practice, however, these calculations can be extremely 
computationally-intensive, especially for models having a 
large number of parameters. But in recent years, thanks 
to advances in high-speed computing and the develop¬ 
ment of efficient sampling algorithms [in na, integra¬ 
tions over model parameter spaces having hundreds or 
even thousands of dimensions are now possible. Thus, 
the use of Bayesian inference to solve diverse problems 
in the physical sciences has increased dramatically, given 
the ability to do numerical calculations, which, in the 
past, were not possible in practice. 

In particular, in the field of gravitational-wave data 
analysis, it is now common to see Bayesian inference 
used for: (i) detector noise estimation and model¬ 
ing [T3| [M], (ii) sky-localization of signals from un¬ 
modeled graviational-wave bursts, and (hi) parame¬ 
ter estimation for gravitational-wave signals associated 
with many different sources—binary inspiral events. 
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continuous-wave sources (e.g., non-axisymmetric rotat¬ 
ing neutron stars), and stochastic gravitational-wave 
backgrounds m of either astrophysical or cosmologi¬ 
cal origin. Although Bayesian inference and frequen- 
tist optimally-filtered statistic methods give equivalent 
results for sufficiently simple signal and noise models and 
simple choices for the priors, the Bayesian formalism al¬ 
lows one to more easily handle problems involving more 
complicated models and/or non-trivial priors. This is the 
case when the model contains so-called nuisance param¬ 
eters (such as non-negligible correlated noise), which are 
not of direct astrophysical interest, but nonetheless affect 
statistical statements about the signal parameters. For 
example, in the presence of correlated noise, the stan¬ 
dard optimally-filtered cross-correlation statistic [S] for 
isotropic gravitational-wave backgrounds no longer cor¬ 
responds to the optimal combination of the data from 
the two detectors. Calculating the maximum of the like¬ 
lihood function is more complicated for this case, with no 
analytic closed-form solution in general. But a Bayesian 
approach to this problem, which numerically explores the 
likelihood function using e.g., Markov Chain Monte Carlo 
(MCMC) methods, is a viable alternative. 

For gravitational-wave backgrounds generated by a su¬ 
perposition of signals from a population of astrophysi¬ 
cal sources, Bayesian inference is particularly convenient 
since it allows one to compare several viable signal-l-noise 
models. Depending on the number of sources emitting 
gravitational waves in a particular time-frequency vol¬ 
ume, the measured signal could be either: (i) stochas¬ 
tic and Gaussian distributed, (ii) stochastic but non- 
Gaussian, (iii) a superposition of individually resolvable 
signals, or (iv) some combination of both deterministic 
resolvable signals and a non-deterministic (i.e., stochas¬ 
tic) background. Using Bayesian model selection, we 
can rank these various models, and thus characterize the 
gravitational-wave component of the data. The following 
sections describe this procedure for the case of simulated 
data consisting of Gaussian white detector noise plus a 
superposition of sinusoidal signals from an astrophysical 
population of gravitational-wave sources. 


III. BAYESIAN INFERENCE APPLIED TO 
NON-GAUSSIAN BACKGROUNDS 


In this section we specify the various probability dis¬ 
tributions, likehood functions, posterior distributions, 
etc. that we will need in order to apply Bayesian in¬ 
ference to searches for non-Gaussian gravitational-wave 
backgrounds. Readers interested in more details regard¬ 
ing some of the calculations performed in this section 
should consult e.g., [TS] . 


A. Noise and signal probability distributions 

For simplicity, consider the simple case of N samples 
of data in a pair of co-located and co-aligned detectors: 


Si = ni 


S2 = n2 


( 1 ) 


where Si = [sn, si 2 , • • • etc. We will assume that 

the noise in each detector is Gaussian, white, and inde¬ 
pendent of one another, with zero mean and variance 


p„(n|6'„) = 


-i Cr^n 


^/det(27rC„) 


e 2 


where 


ni 

n2 


C„ = 


’’l IwxAf ^NxN 
^2 '^NxN 


OwxAf cr,!/ 


( 2 ) 


( 3 ) 


and On = {a’i,(T 2 }. The signal h, which is common to 
both detectors, is assumed to come from a probability 
distribution which need not be Gaussian. The 

probability distribution ph(h\0h) is called a parame¬ 
terized signal prior and 6^ are called hyperparameters 
[muH]. Examples of parameterized signal priors include: 


(i) Gaussian, white signal prior. 

N 


PhiHOh) = 


1 






rat 


( 4 ) 


where eh = {at}- 

(ii) Two-component Gaussian, white signal prior. 
1 


N 


Phih\9h) = 


i=l 


5 




+ (1-0 


1 






( 5 ) 


where 6^ = {^, a,/3}. The two-component Gaussian 
signal prior reduces to the Gaussian signal prior in the 
limit ^ 1. It reduces to the Drasco and Flanagan 

signal prior [8] in the limit /3 —>■ 0. The Drasco and 
Flanagan signal prior corresponds to Gaussian bursts 
with root-mean-square (rms) amplitude a and probabil¬ 
ity 0 ^ 1 - 

(iii) Non-standardized Student’s t-distribution signal 
prior. 


N 


Ph{h\eh) = 
where 


2 = 1 


F 



1- 

P 

1)0^ 

\ V J 


m = f 
^0 


dxx'^-^e-’^ 


( 6 ) 


( 7 ) 
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is the Gamma function and Oh = {I'^a}. The above 
distribution is an extension of the standard Student’s t- 
distribution, which includes a scaling parameter a in ad¬ 
dition to the number of degrees of freedom, i/ > 0 (real). 
(The t of Student’s t-distribution is given by hija.) The 
scaling parameter a is related to the variance of each hi 

by 


Here 0 = {0n,9h} denotes the combined set of noise and 
signal parameters. 


(i) For the Gaussian signal prior, we find: 


p{s\9) 


1 

•\/det(27rCs) 


c: 


(14) 


al = ^ , for > 2 . (8) 

For V —>■ oo, the non-standardized Student’s t- 
distribution becomes a Gaussian distribution with the 
above variance. 

(iv) Multi-sinusoid signal prior. 

Ph{h\9h)=6(h-h{9h)) , (9) 

M 

hi{0h) = ^Ai cos(27r//t, - ifi ), (10) 

7^1 

where i = and 9h = = 

1,2, ••• ,M}. Here M can take on any value between 
0 and Mniax) where M^ax is the maximum number of 
allowed sinusoids (e.g., M^ax = 100). This is a deter¬ 
ministic signal model corresponding to the superposition 
of M individually resolvable sinusoids. 

Although it is possible to write down more compli¬ 
cated non-Gaussian signal priors (since there are an in¬ 
finite number of ways for a signal to be non-Gaussian), 
for the analysis considered in this paper, we will restrict 
ourselves to those given above. 


where 


Cs — Cn + O'h 


'^NxN ^NxN 
^NxN ^NxN 


(15) 


The likelihood function given by (141 and (15) has 


the standard form used as the starting point for 
cross-correlation analyses for Gaussian stochastic back¬ 
grounds nails]. 

(ii) For the two-component Gaussian signal prior, 
we obtain a two-component Gaussian distribution for 
the margi naliz ed likelihood, with covariance matrices 
similar to (15), but with cr^ replaced by and by 
for the two components, respectively: 


p{s\9) = ^ 


1 


-i C“^s 


where 


i/detpTrCO 
+ (1-0 


: e 2 


A/det(27rC/3) 


e-2- S % (16) 


Ca = C„ 


IjVxAT ^NxN 
IjVxAT ^NxN 


(17) 


B. Likelihood functions 

To construct the likelihood function, we first adopt a 
waveform template h and form the residuals ri = Si — h 
and r 2 = S 2 — h. We demand that the residuals be con¬ 
sistent with the probability distribution for the noise (cf. 
(U), which gives rise to a multivariate Gaussian likeli¬ 
hood function for the data: 

p(s|e„,h) .p„(r|«„) = (II) 

where 

But since h are random variables for stochastic signal 
models or specified functions of the parameters 9h for 
deterministic signal models, we are not interested in the 
particular values of h, but rather in the values of the 
parameters 9h that define the signal prior ph(h\9h). We 
thus marginalize over h by performing the integral: 

p{s\9) = p{s\9n,9h) = y dhp(s|0„,h)p;i(h|0/,). (13) 


r = 


51 — h 

52 - h 


( 12 ) 


and similarly for C^. 


(iii) For the non-standardized Student’s t-distribution, 
the marginalization integrals are all of the form: 


1 (»1I-hi) 


dhi 


■ 



(18) 


Unfortunately, we do not know how to analytically 
evaluate such an integral. It is possible to consider an 
Edgeworth expansion of the Student t-distribution in 
terms of its non-zero cumulants, C 2 ,C 4 ,.... But then 
truncating the expansion after a finite number of terms 
would produce a different non-Gaussian distribution, 
that would behave differently in model comparision tests 
from the full Student’s t-distribution. Thus, if we want 
to use this distribution as one of our non-Gaussian signal 
models, we would need to evaluate the above integrals 
numerically. 


(iv) For the deterministic multi-sinusoid signal model, 
the marginalized likelihood is simply 


p(s|0) 


^ -i(s-h(eT))’^c-bs-h(eT)) ^91 

v/det(2^CJ ■ ^ ^ 
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C. Posterior distributions and Bayesian model 
selection 


Given the marginalized likelihood function p(s|0) and a 
prior probability distribution 7r(d) for the noise and signal 
parameters 9 = {d„, 9h}, we can use Bayes’ theorem 


P0\s) 


p{s\e)'K{e) 

f d9'p(sl9')7r(9') 


( 20 ) 


to calculate the joint posterior probability distribution 
for the parameters. The posterior distribution for a sub¬ 
set of the parameters is obtained by marginalizing over 
the other signal and noise parameters. For example, for 
the Gaussian signal model, the posterior distribution for 
the signal parameter ah is obtained by evaluating the 
following integral: 



2 In Bij{s) 

Evidence for model AIj relative to AIj 

< 1 

< 0 

Negative (supports model AIj) 

1-3 

0-2 

Not worth more than a bare mention 

3-12 

2-5 

Positive 

12-150 

5-10 

Stong 

> 150 

> 10 

Very strong 


TABLE I: Bayes factors and their interpretation in terms of 
the evidence in favor of one model relative to the other. 


the Bayes factor. If we fix some model, e.g., A4o, and 
calculate the Bayes factors of all the other models rela¬ 
tive to A4o, the model with the largest Bayes’ factor is 
the preferred model given the data. 

Table [I] gives a list of possible Bayes factor values and 
their interpretation in terms of the evidence in favor of 
one model relative to another. 


P(f^hls) 


da I 


da2 p{ai,a2,ah\s). 


( 21 ) 


D. Comparison to maximum-likelihood analyses 


Similar integrals will give the posterior distributions for 
ai and CT 2 . 

In a similar manner, we can calculate the posterior 
probability distribution for a signal-l-noise model Ai us¬ 
ing Bayes’ theorem in the form 


p(Mls) 


p(slM)7r(M) 

J^jp(slMi)7r(Mi) ■ 


( 22 ) 


The quantity p(s I AI) is called the evidence for model AI. 
It is just the likelihood function p(s|0, AI) marginalized 
over the parameter values 


p(s|AI) 


dep{s\9,M)7T{9\M), 


(23) 


where we have explicitly indicated the model dependence 
of both the prior and likelihood function. 

To compare two models Mi and Mj, we simply take 
the ratio of the posterior probability distributions for 
these two models: 

pjMils) ^ pis\Mi)n{Mi) 
p{Mj\s) p{s\Mj)Tr{Mj) ' 

Note that the the common factor ^j.p(s|AI/)7r(Al7) has 
canceled out when forming the ratio. The left-hand side 
of the above equation is the posterior odds ratio for model 
Mi relative to AIj; we see from this equation that it 
equals the prior odds ratio times the ratio of the evi¬ 
dences. This ratio of evidences is called the Bayes factor 
and is denoted by 


Bij{s) 


p{s\Mi) 
p{s\Mj) ■ 


(25) 


It is interesting to compare the Bayesian model selec¬ 
tion calculation discussed above to a maximum-likelihood 
frequentist analysis, e.g., that presented in [8]. There 
they construct a detection statistic by maximizing the 
likelihood ratio for a signal -I- noise model AIi to the noise- 
only model Mq: 

maxg- max^ n(s|6»„,6»?,,AIi) 

Aml(s)= " \ --. (26) 

max^^p(s|6i(j, AIo) 


The Bayes factor calculation also involves a ratio of two 
quantities, but instead of maximizing over the parame¬ 
ters, we marginalize over the parameters: 


Bio{s) 


p(s|AIi) 

p(s|AIo) 


(27) 


/ dOn f d9hp(sl9n,9h,Mi)7r(9n,9hlMi) 
f p{s\9'n, Mo)tt{9'JMq) 

(28) 


These two expressions can be related to one another by 
using the Laplace approximation to individually approx¬ 
imate the evidences p(s|AIi) and p(s|AIo). As shown in 
App.0 

p(s|AI) ~p(s|4il)^^ , (29) 

Vm 

where M/mIVm is the fraction of the parameter space 
volume for model AI needed to fit the data, and 0 ml 
are the values of the particular model parameters that 
maximize the likelihood. Thus, 

AVj /Vi 

gio(s) cs , (30) 


In many circumstances there is no a priori reason to pre¬ 
fer one model over another (i.e., the prior odds ratio is 
unity), so for these cases the posterior odds ratio is just 


which shows that the Bayes factor is proportional to the 
frequentist maximum-likelihood ratio. The proportion¬ 
ality constant is basically the Occam’s factor mentioned 
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in Sec. [TTj which penalizes a model if its parameter space 
volume is larger than necessary to fit the data. 


E. Signal-to-noise ratios 

One of the parameters that we will use to describe 
the simulations in Sec. is the ratio of the power in 
the injected signals to that of the detector noise. For 
a stochastic gravitational-wave background described by 
the one-sided strain power spectral density Sh{f), the 
expected signal-to-noise ratio of the optimally-filtered 
cross-correlation statistic in a pair of detectors I, J is 
given by pp] 


SNR^I ^ 

Istoch 


df 




. 1/2 


PnAf)Pnjif) 


(31) 


where Tij{f) is the overlap function between detectors 
I and J (see, e.g., m |22])- For a pair of identical, 
co-located and co-aligned detectors the above expression 
simplifies to 


SNR2 


stoch 



(32) 


where Ph{f) = ^ii{f)Sh{f) is the gravitational-wave 
power in a single detector. We are assuming here that the 
signal power is weak relative to the noise, so that the total 
power in detector I is given by Pi{f) = Phif) + Pn,{f) « 

Pu,{f)- 

For a deterministic signal described by the strain re¬ 
sponse h{f), it is often more convenient to work with the 
matched-filter signal-to-noise ratio, which has expected 
value 


SNR^ 


det 



IM/)P 

PnU) 



(33) 


For this case Ph{f) = is the one-sided power 

spectral density for the signal. Note that for determin¬ 
istic signals, the squared signal-to-noise ratio scales with 
the number of frequency bins Wins, while for stochastic 
signals it scales like \/-^bins- 


IV. SIGNAL+NOISE MODELS AND PRIORS 


All - Noise plus Gaussian stochastic model: 

White Gaussian detector noise plus a white Gaus¬ 
sian gravitational-wave background. There is one 
additional parameter corresponding to the variance 
of the background, so 9 = {(Ji, cr 2 , The 
prior on cr^ is also between 0 and 10 , just like the 
detector noise variances. 

AI 2 - Noise plus non-Gaussian (two-component) stochas¬ 
tic model: 

White Gaussian detector noise plus a white two- 
component Gaussian model for the gravitational- 
wave background. There are three parameters for 
the two-component Gaussian model: the variances 
o? and for the two components, and the proba¬ 
bility ^ of one of the components. (The probability 
of the other component necessarily equals I — ^.) 
Thus, 9 = {tTi, ( 72 , a,/3, ^}. The prior on ^ is flat 
from 0 to I. The prior on the variances are 0 to 10 
for the wide component and 0 to 0.5 on the narrow, 
delta-function-like component. 

AI 3 - Noise plus deterministic multi-sinusoid signal 
model: 

White Gaussian detector noise plus up to 100 de¬ 
terministic sinusoids. There are three parameters 
{Aj, fi,ipi} corresponding to the amplitude, fre¬ 
quency, and phase for each sinusoid. Thus, for M 
sinusoids, there are 2-I-3M pameters for this partic¬ 
ular model 9 = {ai,a 2 , Aj, fi, (pj\I = I, 2, • • • M}. 
The prior on the amplitudes is uniform in the range 
A G [0,1000], and the prior on the frequencies is 
uniform across range spanned by the data. The 
prior on the phases is uniform between 0 and 27r. 

AI 4 - Noise plus deterministic multi-sinusoid plus Gaus¬ 
sian background model: 

White Gaussian detector noise plus a Gaussian 
gravitational-wave background plus up to 100 sinu¬ 
soids. As for AI 3 , there are three parameters (am¬ 
plitude, frequency, and phase) for each sinusoid. 
Thus, for Af sinusoids, there are 2-I-I-I-3M param¬ 
eters for this model 9 = {cti, (T 27 A/,//, (/>/]/ = 

I, 2, • • • M}. The priors on the parameters are the 
same as in the previous models. This hybrid model 
allows us to effectively ‘subtract out’ any suffi¬ 
ciently bright sinusoidal signals in the data. 


We consider the following five models for describing 
the signal and noise: 

Ado - Noise-only model: 

This is a noise-only model, which assumes uncor¬ 
related, white Gaussian noise in the two detec¬ 
tors. There are only two parameters for this model, 
9 = {(Ti, cr 2 }- For our simulations, the prior on the 
noise variances are flat between 0 and 10 . 


Note that we do not consider a hybrid “noise plus 
deterministic multi-sinusoid plus non-Gaussian back¬ 
ground” model in the above list, as we expect the 
subtraction of the bright sinusoids to remove most of 
the non-Gaussianity of the signal component. Also, 
as we shall discuss further in Sec. |V] we do not con¬ 
sider a signal-|-noise model with a non-standardized 
Student’s t-distribution for the non-Gaussian stochas¬ 
tic gravitational-wave component. This is because of 
the computational costs associated with the marginalized 
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likelihood evaluations (see Eq. ([T^), which are needed for 
the Bayesian model selection calculations. 


V. SIMULATIONS 


A. Astrophysical source populations 


Simulated data is generated by co-adding sinusoidal 
signals with amplitudes drawn from one of three astro- 
physical models. The frequencies and phases of the sinu¬ 
soids are drawn uniformly from the prior ranges defined 
in the previous section. Gaussian-distributed noise with 
a white power spectrum is then added to the signals. 
The amplitude of the signals is scaled so as to produce 
a pre-specified matched-filter signal-to-noise ratio (SNR) 
per frequency bin, calculated as an average across all fre¬ 
quency bins, using Eq. (33). 


Figure is a plot of the squared amplitude of the noise 
and signal components for a typical simulation using two 
detectors. For an SNR-per-bin of 1, the amplitudes of the 
astrophysicals signals are of the same order-of-magnitude 
as the noise in the two detectors, as can be seen in the 
figure. 



FIG. 1: The squared amplitude of the noise and signal com¬ 
ponents for data in two coincident and coaligned detectors 
consisting of white noise and a superposition of sinusoids 
drawn from an astrophysical source population (signal model 
2), with a source density of 0.1/bin in 128 frequency bins and 
an average SNR-per-bin of 1. The SNR is dominated here by 
the four brightest sources. 


We considered three astrophysical source models: 
Model 0 uniformly distributes standard sirens (sources 
with the same intrinsic amplitude) in space out to some 
cutoff radius r = R, after which the density falls-off ex¬ 
ponential with an e-folding scale of 0.25i?. Model 1 dis¬ 
tributes standard sirens with a Gaussian distribution in 
distance with density p oc e“’’ . For model 0 the 

number of sources in a spherical shell of radius r is pro¬ 


portional to out to r = R. For model 1 the number 
of sources in a spherical shell of radius r is proportional 
to the product , and thus has a larger num¬ 

ber of sources at smaller r, as compared to the uniform 
distribution case. Model 2 is based on a population syn¬ 
thesis model for supermassive black hole binaries, where 
the amplitude of the sources depends on both the mass 
of the system and the distance. The usual frequency de¬ 
pendence of the amplitude was artificially suppressed so 
as to produce a white spectrum. 

The amplitude distributions for the three models are 
shown in Fig. Models 0 and 1 have similar amplitude 
distributions that are fairly tightly peaked, while model 
2 has a large tail extending to high amplitude. 



Amplitude 

FIG. 2: Amplitude distributions for the three astrophysical 
source distributions considered in this study. The amplitude 
scale is arbitrary since the signal-to-noise ratios are set when 
producing simulated data sets drawn from these distributions. 
Models 0 and 1 are for “standard sirens” (equal intrinsic 
amplitude sources) with different spatial distributions, while 
model 2 is based on a population synthesis model where some 
sources have much higher intrinsic amplitudes than others. 


B. Markov Chain Monte Carlo methods 

We performed two types of analyses, both of which em¬ 
ployed trans-dimensional Reversible Jump Markov Chain 
(RJMCMC) algorithms. The first type of analysis looked 
at the signal in a single detector with no instrument noise. 
There the goal was to find which of three statistical mod¬ 
els best described the intrinsic properties of the signal: a 
Gaussian distribution, a two-component Gaussian distri¬ 
bution; or a non-standardized Student’s t-distribution. A 
RJMCMC analysis extends the usual MCMC exploration 
of the parameters of a single model to the exploration of 
a range of models and their parameters, thus allowing us 
to produce marginalized posteriors for both the model 
parameters and posterior distributions for the relative 

























































probability that each model is consistent with the data 
and our prior knowledge. In principle, a single RJMCMC 
routine could explore all three probability distributions 
at once, but we were able to achieve better mixing by 
performing pair-wise comparisons between the Gaussian 
and two-component Gaussian models and the Gaussian 
and the non-standardized Student’s t model. The ratio 
of the number of iterations the Markov chain spends in 
each model yields Bayes Factors between the Gaussian 
reference model and the two non-Gaussian alternatives. 

The second type of analysis considered the detection 
and characterization of the astrophysical signals in the 
presence of detector noise in a two-detector network. 
Here we considered the five models described in Sec. El 
Once again, a single RJMCMC routine could simultane¬ 
ously explore all five models, but achieving efhcient mix¬ 
ing between models with different parameterizations and 
dimensionality is notoriously difficult. Instead, we again 
opted for a pairwise approach, comparing the noise-only 
model AJq to each of the four signal-|-noise models in 
turn. This yields a collection of Bayes factors between 
the reference noise model and the four signal models. It 
is important to note that models Ais and 7VI4 are both 
complicated composite models that allow for a variable 
number of sinusoids to be used in the model. Model AJ4 
further allows for, but does not require, a Gaussian sig¬ 
nal component. Thus AJ4 contains models AJ3, Ati and 
Aio as sub-cases. If we had included a two-component 
Gaussian in AJ4 then we would have been able to ex¬ 
plore all four signal models at once. We did check that 
the relative probabilities for the sub-models included in 
M 4 were consistent with the relative probabilities found 
in the pair-wise comparisons, though the larger model 
space did lead to larger uncertainties on the Bayes fac¬ 
tors. The uncertainties were computed from the variance 
of the running Bayes factors, and compared to analytic 
estimates based on the number of transitions between 
the models [14]. Both methods yielded consistent error 
estimates. We further checked that the error estimates 
were consistent with the spread seen when repeating the 
analysis dozens of times with different random number 
seeds. 


C. Classifying the signals 

We begin looking at the statistical properties of the 
signals themselves. While this is not something we can 
do with actual observations where the signals must be ex¬ 
tracted from noisy data, it is interesting to compare the 
intrinsic properties of the signals to the observed proper¬ 
ties of the signals. 

Figure]^ is a histogram of signal samples as well as the 
best fit Gaussian (“single gauss”), two-component Gaus¬ 
sian (“double gauss”), and non-standardized Student’s 
t-distribution for a simulated signal with an average den¬ 
sity of one source per ten frequency bins. As expected 
for such a sparse population, the single-Gaussian fit is ex- 



FIG. 3: Histogram of signal samples and the correspond¬ 
ing best fit single-Gaussian, double-Gaussian, and non- 
standardized Student’s t-distribution for a signal consisting 
of a superposition of sinusoids drawn from an astrophysical 
population, with a source density of 0.1/bin in 256 frequency 
bins and an SNR-per-bin of 1. 


tremely poor compared to the two-component Gaussian 
or non-standardized Student’s t-distibution fit. 

Figurej^is a plot of Bayes factor quantile intervals as a 
function of the total number of frequency bins, compar¬ 
ing the two-component Gaussian and non-standardized 
Student’s t-distribution models to the reference Gaussian 
model. The source density was set to 10/bin for these 
simulations. The two panels correspond to astrophysical 
source models 0 and 1. In the upper panel the astrophys¬ 
ical sources are drawn from source model 0. In the lower 
panel, the astrophysical sources are drawn from source 
model 1. There is no detector noise in these simulations. 
Note that the Bayes factors in the lower panel are shifted 
slightly higher relative to those in the upper panel, consis¬ 
tent with the expectation that the Gaussian-distributed 
astrophysical source population will tend to produce 
closer—and hence more-easily resolvable—sources. We 
see that at this relative high source density, we need a 
large amount of data (many frequency bins) to detect the 
subtle departure from Gaussianity. 

We should emphasize that the quantile intervals shown 
in Fig. 1^ (and in several other figures to follow) define the 
probability distribution for the Bayes factor values as es¬ 
timated from 256 independent realizations of the simu¬ 
lated signal and noise for each set of parameter values: 
these are not error bars on the individual Bayes faetors. 
For a single realization of the simulated signal and noise, 
the uncertainty in the value of the Bayes factor as esti¬ 
mated from 128 independent Monte Garlo simulations is 
< 10%, which we can ignore in the quantile plots. 

Figure is a similar plot of Bayes factor quantile 
intervals as a function of the number of sources per 
bin, comparing the two-component Gaussian and non- 
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number of frequency bins 



number of frequency bins 

FIG. 4: Bayes factor 80% quantile intervals for the two- 
component Gaussian and non-standardized Student’s t- 
distribution signal models as a function of the total number 
of frequency bins. The source density was set to 10/bin for all 
the simulations. In the upper panel the astrophysical sources 
are drawn from the source model 0. In the lower panel the 
astrophysical sources are drawn from source model 1. 


standardized Student’s t-distribution. The total number 
of bins was set to 128 and the signals were drawn from 
astrophysical source model 0. As expected from the cen¬ 
tral limit theorem, the simulated data is consistent with a 
Gaussian probability distribution when the average num¬ 
ber of signals per frequency bin is large. It is interesting 
to note the large spread in the Bayes factors in the tran¬ 
sition region between 1 and 10 sources per bin. This tells 
us that some realizations look Gaussian, while others look 
highly non-Gaussian. 

Similar to the results shown in Figure the non- 
standardized Student’s t-distribution model has consis¬ 
tently higher Bayes factors than the two-component 
Gaussian model. This suggests using it over the 
two-component Gaussian model when modeling non- 
Gaussian stochastic signals. However, the fact that 
we are not able to find an analytic expression for 
the corresponding marginalized likelihood function (see 
Eq. @) means that the Student’s t-distribution model 



number of sources per bin 

FIG. 5: Bayes factor 80% quantile intervals for the two- 
component Gaussian and non-standardized Student’s t- 
distribution signal models as a function of the total number 
of sources per bin. The total number of bins was set to 128 
and the astrophysical sources were drawn from source model 
0 . 


has a much higher computational cost than the two- 
component Gaussian model. As such, for all subse¬ 
quent model comparison simulations that we do—which 
include simulated noise in a two-detector network—we 
use the two-component Gaussian stochastic model in¬ 
stead of the more expensive non-standardized Student’s 
t-distribution model. 


D. Detecting and characterizing signals in noisy 
detector data 

Next we turn our attention to the observed properties 
of the signals in a more realistic setup that includes in¬ 
strument noise and a network with two co-aligned and 
co-located detectors. A multi-detector analysis is needed 
to distinguish signals from noise. In this study we need 
to consider the dependence on SNR, in addition to the 
dependence on source density and data volume (number 
of frequency bins). In the infinite SNR limit we recover 
the pure signal analysis described in the previous section. 
For more realistic SNRs, a signal that is best described 
as deterministic or non-Gaussian in the absence of noise 
may favor a stochastic or Gaussian description in the 
presence of noise when model simplicity wins out over 
model fidelity. 

In addition to the Gaussian and two-component Gaus¬ 
sian signal models {A4i and AI 2 ), we additionally con¬ 
sider a deterministic model made up of the sum of si¬ 
nusoids (Ala), and a hybrid model with a Gaussian- 
stochastic component and a collection of sinusoids (AI 4 ). 
Figure [^compares the cross-correlated data in two detec¬ 
tors to a marginalized posterior distribution for the fre¬ 
quencies used by the multi-component sinusoid model. In 
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this instance, the brightest sinusoid in the data was con¬ 
fidently detected, as indicated by the large peak in the 
posterior distribution. The second and third brightest 
signals in the data were marginally detected, as indicated 
by the secondary peaks in the posterior distribution. 



frequency (Hz) 

FIG. 6: The cross-correlated signal-l-noise in the two detectors 
for the simulated data shown in Figure 1 is compared to the 
scaled posterior density for the frequencies of the sinusoids 
found by a trans-dimensional MCMC analysis of the data. 
The three brightest signals in the data had amplitudes and 
frequencies {A = 4.46, / = 22.45), {A = 4.41, / = 110.66) 
and {A = 3.73, / = 39.23). Only the brightest of these was 
a clear detection, though the analysis did occasionally lock 
onto the other signals. 

The question of wether the data is best described by 
a deterministic model, a non-Gaussian stochastic model 
or a Gaussian-stochastic model depends on many factors, 
including the source density, the noise level, the number 
of frequency bins and the SNR-per-bin. In what follows 
we explore the impact of each of these factors. 

A major challenge that we face is that the outcomes 
vary greatly from one simulation to the next, especially in 
the limit that there are few sources and/or few frequency 
bins. To counter this we performed a large number of 
simulations and aggregated the results. When show¬ 
ing Bayes factors between the various signal models and 
the noise model, we display the mean values along with 
the 80% quantile intervals derived from the ensemble of 
simulations. More directly, we also report the fraction 
of times each model had the highest Bayesian evidence 
on a realization-by-realization basis. While the general 
trend is that the Gaussian model is more likely to be 
favored as the number of sources per bin increases, the 
deterministic model can sometimes be preferred at high 
source density, and the Gaussian-stochastic model can 
sometimes be preferred at low source density. Note that 
while ground and space based interferometric detectors 
and pulsar timing arrays nominally cover much larger fre¬ 
quency bands than we consider here, their “V” shaped 
sensitivity curves limit the effective number of frequency 



number of sources per bin 



FIG. 7: The fraction of times a non-Gaussian model has 
higher evidence than the Gaussian model as a function of the 
source density for the three source models. For these simula¬ 
tions the number of bins was fixed at 32 and the average SNR- 
per-bin was fixed at 2. Upper panel: The fraction of times 
the deterministic, multi-sinusoid model Ala had higher evi¬ 
dence than the Gaussian-stochastic model A4i. Lower panel: 
The fraction of times the stochastic two-component Gaussian 
model M 2 had higher evidence than the Gaussian-stochastic 
model Ml- 

bins to lOO’s for interferometers and lO’s for pulsar tim¬ 
ing arrays. 

Figure shows the fraction of times that a non- 
Gaussian model has higher evidence than the Gaussian 
model as a function of the source density for the three 
source models. Here the the number of bins was fixed at 
32 and the average SNR-per-bin was fixed at 2. The gen¬ 
eral trend is as expected from the central limit theorem— 
as the number of signals per frequency bin grows the data 
looks less deterministic and more Gaussian. The more 
realistic source model 2, which has a variety of intrinsic 
source luminosities, was consistently less Gaussian than 
source models 0 and 1 which assumed equal luminosity 
sources. Even at high source densities model 2 could 
appear deterministic or non-Gaussian. 

Figure extends the study of the dependence on the 
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FIG. 8: Upper panel: Bayes factor 80% quantile intervals for 
the four different signal+noise models relative to the noise- 
only model as a function of the number of sources per bin. The 
total number of bins was set to 32 for these simulations, and 
the astrophysical sources were drawn from source model 2. 
The SNR-per-bin was fixed at 2, with different realizations of 
noise used for the different simulations. Lower panel: Fraction 
of time that the different models had the largest Bayes factor 
for the different simulations. 


FIG. 9: Upper panel: Bayes factor 80% quantile intervals for 
the four different signal-l-noise models relative to the noise- 
only model as a function of the total number of frequency bins. 
The source density was set to 1 /bin for all the simulations, and 
the astrophysical sources were drawn from source model 2. 
The SNR-per-bin was fixed at 2, with different realizations of 
noise used for the different simulations. Lower panel: Fraction 
of time that the different models had the largest Bayes factor 
for the different simulations. 


source density to include the full set of signal models, 
and in addition to showing the fraction of time that each 
model is favored, also shows the Bayes factor quantile 
intervals for the four signal models. The total number of 
bins was set to 32 for these simulations, which included 
simulated noise in addition to the simulated astrophysi¬ 
cal signals from source model 2. For these simulation the 
SNR-per-bin was fixed at 2, with different realizations of 
the noise used for the different simulations. Note that 
for low source densities, the models that include deter¬ 
ministic sinusoid signals are the preferred models. The 
effectiveness of models having Gaussian or non-Gaussian 
stochastic signal components improve as the source den¬ 
sity increases. As expected, the hybrid model performs 
best for all source densities. 

Figure shows how the model selection results are 
affected by the number of frequency bins, keeping the 


source density fixed at one-per-bin and the SNR-per-bin 
fixed at 2. As the number of frequency bins increases, the 
chances of having one or two loud sources dominate the 
total signal increases, and consequently, the determinis¬ 
tic multi-sinudoid model and the two-component Gaus¬ 
sian stochastic models are more likely to outperform the 
Gaussian model. 

Finally, Figure shows how the model selection is 
affected by the SNR-per-bin, keeping the source density 
fixed at one-per-bin and the number of bins fixed at 32. 
The Bayes factors for all the signal-l-noise models increase 
quadratically with increasing SNR-per-bin, which is to be 
expected when compared to the noise-only model. For 
sufficiently large SNR-per-bin (so that the signal is de¬ 
tected by all the models), the relative performance of the 
various signal models is independent of the SNR-per-bin. 
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snr per bin 



FIG. 10: Upper panel: Bayes factor 80% quantile intervals for 
the four different signal+noise models relative to the noise- 
only model as a function of SNR-per-bin. The total number 
of bins was set to 32 and the source density to 1/bin for 
all the simulations. The astrophysical sources were drawn 
from source model 2. Lower panel: Fraction of time that the 
different models had the largest Bayes factor for the different 
simulations. 


VI. DISCUSSION 


We presented a Bayesian search for non-Gaussian 
gravitational-wave backgrounds. We found that a grav¬ 
itational wave signal comprised of the sum of discrete 
sources drawn from some astrophysical population may 
be best described as either deterministic, non-Gaussian 
stochastic, or Gaussian-stochastic depending on the num¬ 
ber of sources, and the size of the data set. In our stud¬ 
ies the simulated data were produced by adding together 
multiple sinusoids with amplitudes drawn from one of 
three astrophysical source distributions, to which was 
added an independent white noise realization in each de¬ 
tector. While the deterministic signal model AI 3 , made 
up of a variable number of sinusoids, is able to precisely 
match the simulated data, the simpler Gaussian j\4 1 and 
non-Gaussian AI 2 stochastic signal models are often pre¬ 


ferred. The general trend follows our expectation that 
the signals appear increasingly stochastic and Gaussian 
as the number of sources per frequency bin increases. We 
found that departures from Gaussianity are more likely 
to be detected in large data sets (in our case, for large 
numbers of frequency bins), but that the ability to distin¬ 
guish between the various signal models was independent 
of the SNR (once the signal became detectable). In all 
cases, a hybrid model, A^ 4 , that combines variable con¬ 
tributions from deterministic and stochastic signals that 
are determined by the data, outperformed each of the sin¬ 
gle element models most of the time. This finding may be 
of particular relevance to the detection of low frequency 
gravitational waves by pulsar timing arrays. 

Although for simplicity we considered co-located 
and co-aligned detectors and white power spectra, the 
method can easily be extended to handle more realistic 
detector geometry (i.e., separated and misaligned detec¬ 
tors) as well as colored spectra. One can also consider 
additional signal and noise models. 
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Appendix A: Bayes factor calculation 


Suppose we have two models that we would like to 
compare, denoted A4i and A4o, with p arame ters and 
6 * 0 , respectively. As described in Sec. 


me 


the Bayes 

factor Sio(s) for model A4i relative to model A4o given 
observed data s is defined by 


^io(s) — 


p(s|A^i) 
p(s|Afo) ’ 


(Al) 


where 


p(s|Al) = J dO p{s\9, M)Tr{0\M) (A 2 ) 

for either model (i.e., M = A4o or A4i). The quantity 
p(s|Al) is called the evidence for model Al. The Bayes 
factor is the ratio of the evidences for the two models; it 
equals the posterior odds ratio for the two models if they 
have equal a priori probabilities. 

Since analytic or direct calculations of the evidence 
integrals is usually not possible, we need to estimate the 
Bayes factor numerically. In this appendix, we describe 
a few of the methods used for the study described in 
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this paper. Readers interested in more details should see 
Sec. II of [23]. 


1. Laplace approximation 


It has logarithm 


2 InSio(s) ~ 2 In 




(AlO) 


If we assume that the data is informative, so that the 
likelihood function is peaked relative to the prior proba¬ 
bility distribution, then we can use the L apla ce approxi¬ 
mation to estimate the evidence integral (A2): 


p(s|AI) :^p{s\0Mh,M) 


AVm 

Vm 


(A3) 


where 0 ml = 0 ml(s) maximizes the likelihood p{s\9,M) 
with respect to variations of 0; AVm is the characteristic 
width of the likehood function around its maximum; and 
Vm is the total parameter space volume for the model 
parameters. The ratio AVmI^m can be thought of as an 
Occam’s factor, which penalizes a model if its parameter 
space volume is larger than needed to fit the data. Doing 
this calculation for both AIq and AIi, and then taking 
the ratio of the two results, we find 


^10 (s) 


p(s|0l,ML,>tl) ADi/IA 

p(s|0o,ML, AIo) ADo/Vb 


Aml(s) 


ARi/Ri 

ARo/K) ’ 


(A4) 

(A5) 


Since jN is the variance of the sample mean s 

(or, equivalently, it is the characteristic width of the like¬ 
lihood function around its maximum value), we see that 
twice the log of the Bayes factor is effectively the squared 
SNR of the maximum-likelihood estimator 0 ml (s). The 
first term on the right-hand side of Eq. (AlO) is the Oc¬ 
cam’s penalty factor associated with size of the parameter 
space volume 0max- This term gets smaller as the num¬ 
ber of data points JV increases since a decreases with 
increasing N. 


2. Savage-Dicke density ratio 

The Savage-Dicke density ratio can be defined when¬ 
ever model Ado is a subset of model Adi, and the prior 
probabilities factorize. Both of these conditions hold, for 
example, if 0i = {0o,0extra}, with 

^(0-i|Adi) = ^(0o|Ado)7r(0';xtra|Adi) (All) 


and 


where Aml(s) is the maximum-likelihood ratio. 

As a very simple example, consider the case of N sam¬ 
ples of data s, consisting of a unknown constant signal 
in additive white Gaussian-stationary noise with known 
variance a. Let Ado denote the noise-only model with 
likelihood function 


p(s|Mo) = n^^e-*^/2‘^% (A6) 

i=l V 

and let Adi be the signal-\-noise model defined by the 
likelihood function 


N 


p(s|0,Adi) = 


2=1 


V^TTO^ 




(A7) 


and prior 7r(0) = l/0max, where 0 € [O,0niax] is the un¬ 
known signal amplitude. Then one can easily show that 
the maximum-likelihood parameter value is the sample 
mean 


N 


0ml(s) = Si = 


(AS) 


and the Bayes factor for the signal-)-noise model Ad i rel¬ 
ative to the noise-only model Ado is: 


a/^/N 

Oio(s) ^ -T-exp 


1 


2cr2/iV_ ■ 


(A9) 


p(s|0o,Ado) =p(s|0i,Adi)|g._^^^^^^g-^^^^^_^ (A12) 

for some fixed set of parameter values 0extra,o- The 

Savage-Dicke density ratio rio(s) is then defined as 

1 _ 7J'(0extra,o|Adi) / A 1 

^io{s) = - 1 , . ' > (A13) 

P(^extra,0 I®: J^l ) 

where p(0extra,o|s, Adi) is the marginalized probability 
density function 

p{e 

extra |s,Adi)= J d9op{0o 

; ^extra |s,Adi) (A14) 


evaluated at 0extra = 0extra,o- Using Bayes’ theorem in 
the form 


^extra|®5 Adi) 


P(®l^07 ^extra; Adi)7r(0o, ^extra |Adi) 


p(s|Adi) 


(A15) 


and Eqs. (All) and (A12), one can show that 


rio(s) = 6io(s) 


(A16) 


exactly. The advantage of using the expression for the 
Savage-Dicke density ratio to estimate the Bayes factor 
is that it only requires exploration of the posterior dis¬ 
tribution for model Adi. 
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3. Reversible jump MCMC 


Reversible jump, or trans-dimensional MCMC algo¬ 
rithms, explore the space of models in addition to the 
parameters of each model. The Bayes factor between 
two models A4o and A4i is simply estimated from the 
ratio of the number of iterations that the chain spends 
in each model: 


Bio{s) 


number of interations in model Af i 
number of interations in model A4o 


(A17) 


The accuracy of the estimate depends on the number 
of transitions between the two models—the more tran¬ 
sitions, the more accurate the estimate. The problem 
with this simple approach is that it becomes difficult to 
compute Bayes factors smaller than 10“^ or larger than 
10 ^, since the chains spend very little time in the dis¬ 
favoured model, and hence the exploration of that model 
can fail to converge to the stationary state. Ideally, we 


would like the chain to spend an equal amount of time in 
each model, so that all models are explored equally well 
(that is, assuming each model has a comparable dimen¬ 
sionality; if the model dimensions are significantly differ¬ 
ent, more time should be spent exploring in the higher¬ 
dimensional model). 


To achieve good mixing within each model and be¬ 
tween models, we introduce an artificial prior weighting 
on the models that compensates for the difference in the 
Bayes factors. For example, if the Bayes factor between 
two models is 1000, we introduce a prior that favors the 
low probability model by a factor of 1000, so the chains 
spends an equal number iterations in each model [24| . 
Since the appropriate weighting is not known in advance, 
an iterative scheme is used that adjusts the artificial prior 
weighing on the models until balance is achieved. The 
true Bayes factors are then found from the iteration ra¬ 
tio divided by the artificial prior odds ratio. 
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